Dual Coordinate Descent Algorithms for Efficient Large Margin Structured Prediction
Authors
Ming-Wei Chang, Wen-tau Yih
Abstract
Due to the complex nature of NLP problems, structured prediction algorithms have been important modeling tools for a wide range of tasks. While there exists evidence showing that the linear Structural Support Vector Machine (SSVM) algorithm performs better than the structured Perceptron, the SSVM algorithm is still less frequently chosen in the NLP community because of its relatively slow training speed. In this paper, we propose a fast and easy-to-implement dual coordinate descent algorithm for SSVMs. Unlike algorithms such as the Perceptron and stochastic gradient descent, our method keeps track of dual variables and updates the weight vector more aggressively. As a result, the training process is as efficient as existing online learning methods, yet consistently yields better models, as evaluated on four benchmark NLP datasets for part-of-speech tagging, named-entity recognition, and dependency parsing.
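To make the abstract's core idea concrete, here is a minimal sketch of dual coordinate descent on the structural hinge loss, using multiclass classification as a degenerate structured problem (each "structure" is a single label). This is not the paper's exact procedure; the feature map, constraint handling, and update schedule are illustrative assumptions.

```python
import numpy as np

def phi(x, y, n_classes):
    """Joint feature map: a class-specific copy of x (an illustrative choice)."""
    f = np.zeros(n_classes * x.size)
    f[y * x.size:(y + 1) * x.size] = x
    return f

def dcd_ssvm(X, Y, n_classes, C=1.0, epochs=20):
    n, d = X.shape
    w = np.zeros(n_classes * d)
    alpha = np.zeros((n, n_classes))           # dual variables alpha[i, y]
    for _ in range(epochs):
        for i in range(n):
            x, yi = X[i], Y[i]
            # Loss-augmented inference: find the most violating structure.
            scores = w.reshape(n_classes, d) @ x + (np.arange(n_classes) != yi)
            ybar = int(np.argmax(scores))
            if ybar == yi:
                continue                       # no margin violation
            dphi = phi(x, yi, n_classes) - phi(x, ybar, n_classes)
            violation = 1.0 - w @ dphi         # hinge: loss minus margin
            denom = dphi @ dphi
            if denom == 0.0:
                continue
            # Closed-form coordinate step, clipped so that
            # alpha[i, y] >= 0 and sum_y alpha[i, y] <= C stay satisfied.
            delta = violation / denom
            delta = max(-alpha[i, ybar], min(delta, C - alpha[i].sum()))
            alpha[i, ybar] += delta
            w += delta * dphi                  # keep w in sync with alpha
    return w
```

Because the step size comes from the dual closed form rather than a fixed learning rate, each update moves w as far as the dual constraints allow, which is the "more aggressive" behavior the abstract refers to.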
Similar Papers
A Sequential Dual Method for Structural SVMs
In many real-world prediction problems the output is a structured object such as a sequence, a tree, or a graph. Such problems range from natural language processing to computational biology and computer vision, and have been tackled using algorithms referred to as structured output learning algorithms. We consider the problem of structured classification. In the last few years, large margin class...
Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
Log-linear and maximum-margin models are two commonly used methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, whe...
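As a rough illustration of the update these EG algorithms build on, the sketch below runs exponentiated gradient on a toy quadratic objective over the probability simplex; in the paper the same multiplicative update is applied to per-example dual distributions of CRFs and max-margin Markov networks. The objective, step size eta, and iteration count are illustrative assumptions.

```python
import numpy as np

def eg_minimize(A, b, eta=0.1, steps=500):
    """Exponentiated gradient over the probability simplex (toy objective)."""
    n = len(b)
    u = np.full(n, 1.0 / n)            # start from the uniform distribution
    for _ in range(steps):
        grad = A @ u - b               # gradient of f(u) = 0.5 u'Au - b'u
        u = u * np.exp(-eta * grad)    # multiplicative EG step
        u /= u.sum()                   # renormalize back onto the simplex
    return u
```

The multiplicative form keeps every coordinate positive and the renormalization keeps the iterate a valid distribution, which is why EG is a natural fit for dual variables that must form a distribution.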
Efficient Methods for Learning and Inference in Structured Output Prediction
Binary classification has dominated machine learning for decades. However, many modern applications of machine learning require modeling complex high-dimensional outputs. Although the output space in such problems is typically very large, it often has intrinsic structure which can be exploited to construct efficient algorithms. Indeed, in recent years structured output prediction has shown stat...
Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
Stochastic gradient descent is popular for large scale optimization but has slow convergence asymptotically due to the inherent variance. To remedy this problem, we introduce an explicit variance reduction method for stochastic gradient descent which we call stochastic variance reduced gradient (SVRG). For smooth and strongly convex functions, we prove that this method enjoys the same fast conv...
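A minimal sketch of the SVRG idea described here, applied to least-squares regression: each outer epoch computes a full gradient at a snapshot, and each inner stochastic step corrects the sampled gradient with the same sample's gradient at the snapshot, which removes the variance that slows plain SGD. The loss, step size, and epoch length are illustrative assumptions.

```python
import numpy as np

def svrg(X, y, lr=0.01, outer=30, inner=None):
    """SVRG on least-squares loss (illustrative choices throughout)."""
    n, d = X.shape
    inner = inner or n
    w = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(outer):
        w_snap = w.copy()
        full_grad = X.T @ (X @ w_snap - y) / n    # full gradient at snapshot
        for _ in range(inner):
            i = rng.integers(n)
            xi, yi = X[i], y[i]
            gi = xi * (xi @ w - yi)               # sampled gradient at w
            gi_snap = xi * (xi @ w_snap - yi)     # same sample at snapshot
            w -= lr * (gi - gi_snap + full_grad)  # variance-reduced step
    return w
```

The correction term gi - gi_snap + full_grad is an unbiased gradient estimate whose variance shrinks as w approaches w_snap, so a constant step size can be used.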
Penalized Bregman Divergence Estimation via Coordinate Descent
Variable selection via penalized estimation is appealing for dimension reduction. For penalized linear regression, Efron et al. (2004) introduced the LARS algorithm. Recently, the coordinate descent (CD) algorithm was developed by Friedman et al. (2007) for penalized linear regression and penalized logistic regression, and was shown to be computationally superior. This paper explores...
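As a concrete instance of the coordinate descent algorithm this abstract refers to, here is a minimal sketch for penalized linear regression (the lasso): each coefficient is updated in closed form by soft-thresholding while the others are held fixed. The penalty lam, epoch count, and residual bookkeeping are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, lam):
    """Closed-form solution of the one-dimensional lasso subproblem."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def cd_lasso(X, y, lam=0.1, epochs=100):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, d = X.shape
    beta = np.zeros(d)
    r = y - X @ beta                    # residual, kept in sync with beta
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(epochs):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual
            # (the residual with feature j's contribution added back).
            rho = X[:, j] @ r + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho / n, lam) / (col_sq[j] / n)
            r += X[:, j] * (beta[j] - new_bj)     # maintain the residual
            beta[j] = new_bj
    return beta
```

Keeping the residual updated incrementally is what makes each coordinate step cheap, which is the source of the computational advantage the abstract mentions.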
Journal:
TACL
Volume 1, Issue -
Pages -
Published: 2013